Product distribution theory is a new collective intelligence-based framework for
analyzing and controlling distributed systems. Its usefulness in distributed stochastic
optimization is illustrated here through an airline fleet assignment problem. This
problem involves the allocation of aircraft to a set of flight legs in order to meet
passenger demand, while satisfying a variety of linear and non-linear constraints. Over
the course of the day, the routing of each aircraft is determined in order to minimize
the number of required flights for a given fleet. The associated flow continuity and
aircraft count constraints have led researchers to focus on obtaining quasi-optimal
solutions, especially at larger scales. In this paper, the authors propose the application
of this new stochastic optimization algorithm to a non-linear objective “cold start”
fleet assignment problem. Results show that the optimizer can successfully solve such
highly-constrained problems (130 variables, 184 constraints).


Introduction 

Schedule development, a crucial aspect of profitable
airline management, involves many steps, including
schedule design, fleet assignment, aircraft routing, and crew
pairing. 1 In this project, we assume that schedule design
has been finalized; the focus is on fleet assignment, that is,
the assignment of available aircraft to the scheduled flights,
and on aircraft routing, the sequence of flights to be flown by
each aircraft throughout the day (Figure 1). Typical fleet
assignment objectives include minimizing assignment cost
or maximizing the profit from each flight. In our case, the
objective is to meet the passenger demand throughout the
day with the lowest total landing and takeoff (LTO) costs.

Fleet assignment problems can be classified as either
“warm start”, in which case an existing assignment is used
as a starting point, or “cold start”, in which only the fleet
size, aircraft types, and passenger demand are known. 2
Fleet assignment and aircraft routing problems have been 
solved using various optimization methods, including integer 
linear programming, 3,4 neighborhood search, 5 and genetic 
algorithms. 6 

An alternate approach pursued here is to distribute the
optimization among agents that represent, for example,
members of the fleet or the airports in the network.
Formulating the problem as a distributed stochastic optimization
allows for the application of techniques from machine learning,
statistics, multi-agent systems, and game theory. The
current work leverages these fields by applying a Collective
Intelligence (COIN) technique, Product Distribution (PD)
theory, to a sample fleet assignment problem. Typically,
in stochastic optimization approaches, probability distributions
are used to help search for a point in the variable
space which optimizes the objective function. In contrast,
in the PD approach the search is for a probability distribution
across the variable space that optimizes an associated
Lagrangian. Since the probability distribution is a vector
in a Euclidean space, the search can be done via gradient-based
methods even if the variable space is categorical. Similar
techniques have been successfully applied to a variety of
distributed optimization problems including network routing,
computing resource allocation, and data collection by
autonomous rovers. 7-9

Fig. 2 The 9-airport, 20-arc problem.

The next section of the paper details the formulation of 
the optimization problem. This is followed by a description 
of the COIN and PD theory framework. Finally, results 
from an example fleet assignment problem are presented.
These results validate the predictions of this theory, and 
indicate its usefulness as a general purpose technique for 
distributed solutions of constrained optimization problems. 

Problem Statement 

The objective is to determine the aircraft routing and resident
fleet size at each airport that minimize the landing and
takeoff fees levied by airports while meeting demand. The
9-airport, 20-flight directed arc sample problem (Figure 2)
is used to demonstrate the performance of the approach.
The passenger demand on each arc is given as a function of
time (determined as part of the schedule design). The day
is split into six 4-hour segments. It is assumed that each
arc can be flown and the aircraft turned around in one time
segment. The optimization problem is as follows:

Minimize:     Total LTO fees

Variables:    Number of aircraft on each arc
              Resident fleet at each airport
              Airplane passenger capacity

Constraints:  Passenger demand
              Assignment continuity
              Resident fleet conservation
              Total fleet size

Pax Capacity w    Cost Factor F(w)
100               1.0
200               1.5
300               2.0

Table 1 LTO cost factor as a function of aircraft passenger capacity.


The three types of variables are: $u_{i,j}$, the number of aircraft
assigned to flight arc $i$ at time segment $j$; $v_k$, the
number of resident aircraft at airport $k$; and $w$, the passenger
capacity of the airplane. The resident fleet is the
number of airplanes at each airport at the start and end of
the day, which must be the same to repeat the schedule the
next day. Each variable has an allowable range. For example,
for our 9-city, 20-arc case:

$$0 \le u_{i,j} \le 12$$
$$0 \le v_k \le 30$$
$$w \in \{100, 200, 300\}$$

The resident fleet size $SI_k$ at each airport $k$ must equal
the number of airplanes $SF_k$ at the end of the day so the
schedule can be restarted the following day. In equation
form, we require:

$$-SI_k + SF_k \le 0$$

with:

$$SI_k = v_k$$
$$SF_k = N_{k,i} \cdot u_{i,j_{\mathrm{final}}}$$

The total daily LTO cost is a function of $F(w)$ (see Table 1)
and the number of segments flown. The non-linear
objective function can be written as:

$$\min_{u_{i,j},\, v_k,\, w} G = F(w) \sum_i \sum_j u_{i,j}$$

There are 20 arcs and 6 time segments in this problem,
which, with 9 airports and 1 aircraft type, results in a total
of 130 variables. Constraints are required to ensure that
passenger demand $D_{i,j}$ is met in full by capacity $C_{i,j}$ for
each arc, at each time segment. There are 20 arcs and 6 time
segments, for a total of 120 passenger demand constraints.
For these non-linear constraints to be satisfied:

$$-C_{i,j} + D_{i,j} \le 0$$

with:

$$C_{i,j} = w \cdot u_{i,j}$$
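As a minimal sketch of how the objective and the passenger demand constraints could be evaluated, assume the assignment and the demand are held as nested lists indexed by arc and time segment. The function names and the small data set below are illustrative assumptions, not part of the original formulation.

```python
# LTO cost factor F(w) from Table 1.
COST_FACTOR = {100: 1.0, 200: 1.5, 300: 2.0}

def lto_cost(u, w):
    """Objective G = F(w) * (total number of flights in u)."""
    return COST_FACTOR[w] * sum(sum(row) for row in u)

def demand_violations(u, w, D):
    """Unmet demand max(0, D_ij - C_ij) per arc and segment, with C_ij = w * u_ij."""
    return [[max(0, d - w * n) for n, d in zip(u_row, d_row)]
            for u_row, d_row in zip(u, D)]

# Hypothetical 2-arc, 3-segment data (not taken from the paper).
u = [[1, 0, 2], [0, 1, 1]]
D = [[150, 0, 380], [0, 90, 250]]
cost = lto_cost(u, 200)
unmet = demand_violations(u, 200, D)
print(cost, unmet)
```

A demand constraint is satisfied exactly when the corresponding entry of `unmet` is zero; the product of the capacity variable and the assignment variable is what makes these constraints non-linear.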


While the framework supports multiple aircraft models, in 
this example problem the fleet is composed of a single air- 
craft type, for which the passenger capacity is a variable. 
Assignment continuity ensures that an airplane can only be
assigned to an arc if an airplane is available at the originating
airport. With 9 airports and 6 time segments, 54
continuity constraints are included. Defining $S_{k,j}$ as the
state of the fleet at airport $k$ at the beginning of time increment
$j$, we require:

$$-S_{k,j} \le 0$$

where:

$$S_{k,j} = S_{k,j-1} + M_{k,i} \cdot u_{i,j} + N_{k,i} \cdot u_{i,j-1}$$

The $M$ matrix is used to tally outbound aircraft for each
airport during a time segment. Likewise, $N$ is used to determine
the inbound aircraft to be added to an airport pool.
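The continuity bookkeeping can be sketched as follows. This is an illustration under stated assumptions: M is taken to hold -1 where an arc departs an airport and N to hold +1 where it arrives, the state before the first segment is taken to be the resident fleet, and no aircraft are in flight before the first segment. None of these conventions is spelled out in the text, and `fleet_states` and `feasible` are hypothetical helper names.

```python
def fleet_states(arcs, v, u):
    """Return S[k][j] for airports k and time segments j.

    arcs: list of (origin, destination) airport indices, one per arc i
    v:    resident fleet per airport (assumed state before the first segment)
    u:    u[i][j], aircraft assigned to arc i in segment j
    """
    n_airports, n_arcs, n_seg = len(v), len(arcs), len(u[0])
    # Assumed sign convention: M tallies departures as -1, N tallies arrivals as +1.
    M = [[-1 if arcs[i][0] == k else 0 for i in range(n_arcs)]
         for k in range(n_airports)]
    N = [[1 if arcs[i][1] == k else 0 for i in range(n_arcs)]
         for k in range(n_airports)]
    S = [[0] * n_seg for _ in range(n_airports)]
    for k in range(n_airports):
        prev = v[k]
        for j in range(n_seg):
            out = sum(M[k][i] * u[i][j] for i in range(n_arcs))
            # Arrivals come from flights flown in the previous segment.
            arr = sum(N[k][i] * u[i][j - 1] for i in range(n_arcs)) if j > 0 else 0
            S[k][j] = prev + out + arr
            prev = S[k][j]
    return S

def feasible(S):
    """Continuity holds when no airport state is negative."""
    return all(s >= 0 for row in S for s in row)

arcs = [(0, 1), (1, 0)]                                      # two airports, one arc each way
S_ok = fleet_states(arcs, v=[1, 0], u=[[1, 0], [0, 1]])      # out, then back
S_bad = fleet_states(arcs, v=[0, 0], u=[[1, 0], [0, 0]])     # departs with no aircraft
print(feasible(S_ok), feasible(S_bad))
```

In the second case the state at airport 0 goes negative in the first segment, which is exactly the situation the continuity constraints rule out.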


The airports in this sample problem contribute 9 resident 
fleet constraints. Finally, the total fleet size F is enforced 
using: 

$$\sum_k v_k - F \le 0$$

This results in a total of 184 constraints. 
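The variable and constraint counts quoted in this section can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are those given in the text.

```python
n_arcs, n_segments, n_airports, n_types = 20, 6, 9, 1

# u_ij for every arc and segment, v_k for every airport, one capacity w.
n_variables = n_arcs * n_segments + n_airports + n_types

n_constraints = (
    n_arcs * n_segments        # passenger demand
    + n_airports * n_segments  # assignment continuity
    + n_airports               # resident fleet conservation
    + 1                        # total fleet size
)
print(n_variables, n_constraints)
```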

Collective Intelligence and Product Distribution Theory

Collective Intelligence (COIN) is a framework for design- 
ing a collective, defined as a group of agents with a specified 
world utility or system-level objective. In the case of the 
fleet assignment problem, the agents match the two types 
of variables: the number of airplanes assigned to each route 
for each time segment, and the size of the resident fleet at 
each airport. The world utility for this problem is the ob- 
jective described above: total LTO fees. 

The COIN solution process consists of the agents select- 
ing actions (a value from the variable space) and receiving 
rewards based upon their private utility functions. These 
rewards are then used by the agents to determine their next 
choice of action. The process reaches equilibrium when the 
agents can no longer improve their rewards by changing 
actions. Product Distribution (PD) theory formalizes and 
substantially extends the COIN framework. 10-12 In particu- 
lar PD theory handles constraints, a necessity for problems 
such as fleet assignment. The core insight of PD theory is to 
concentrate on how the agents update the probability distri- 
butions across their possible actions rather than specifically 
on the joint action generated by sampling those distribu- 
tions. 

PD theory can be viewed as the information-theoretic extension
of conventional full-rationality game theory to the
case of bounded rational agents. Information theory shows
that the equilibrium of a game played by bounded rational
agents is the optimizer of a Lagrangian of the probability
distribution of the agents’ joint-moves. In any game,
bounded rational or otherwise, the agents are independent,
with each agent $i$ choosing its move $x_i$ at any instant by
sampling its probability distribution (mixed strategy) at that
instant, $q_i(x_i)$. Accordingly, the distribution of the joint-moves
is a product distribution, $P(x) = \prod_i q_i(x_i)$. In this
representation, all coupling between the agents occurs indirectly;
it is the separate distributions of the agents $\{q_i\}$
that are coupled, while the actual moves of the agents are
independent. As a result the optimization of the Lagrangian
can be done in a completely distributed manner.
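To make the product-distribution idea concrete, the sketch below draws joint moves by sampling each agent independently and evaluates the joint probability as a product of the per-agent probabilities. The three agents and their distributions are made-up values, and the function names are illustrative.

```python
import random

def sample_joint_move(q, rng):
    """Draw one joint move x = (x_1, ..., x_n), each x_i independently from q[i]."""
    return tuple(rng.choices(range(len(qi)), weights=qi)[0] for qi in q)

def joint_probability(q, x):
    """P(x) = prod_i q_i(x_i) for a product distribution."""
    p = 1.0
    for qi, xi in zip(q, x):
        p *= qi[xi]
    return p

rng = random.Random(0)
q = [[0.5, 0.5], [0.1, 0.9], [1.0, 0.0, 0.0]]   # three hypothetical agents
moves = [sample_joint_move(q, rng) for _ in range(5)]
p = joint_probability(q, (0, 1, 0))              # 0.5 * 0.9 * 1.0
print(moves, p)
```

No agent needs to see another agent's move to sample its own; any coupling enters only through how the individual distributions are later updated.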

When constraints are included, the bounded rational equi- 
librium optimizes the expected value of the world utility 
subject to those constraints. Updating the Lagrange pa- 
rameters weighting the constraints focuses the agents more 
and more on the optimal joint pure strategy. 

This approach provides a broadly applicable way to cast 
any constrained optimization problem as the equilibrating 
process of a multi-agent system, together with an efficient 
method for that equilibrating process. 

The next section reviews the game-theoretic motivation 
of PD theory. This is followed by the details of the resulting 
distributed constrained optimization algorithm used to solve 
the fleet assignment problem. 

Bounded Rational Game Theory 

In noncooperative game theory one has a set of N players. 
Each player i has its own set of allowed pure strategies. A 
mixed strategy is a distribution qi(xi) over player i’s possible 
pure strategies. 

Each player $i$ also has a private utility function $g_i$ that
maps the pure strategies adopted by all $N$ of the players
into the real numbers. Given mixed strategies of all the
players, the expected utility of player $i$ is:

$$E(g_i) = \int dx \prod_j q_j(x_j)\, g_i(x)$$

In a Nash equilibrium, every player adopts the mixed
strategy that maximizes its expected utility, given the mixed
strategies of the other players. Nash equilibria require the
assumption of full rationality, that is, every player $i$ can
calculate the strategies of the other players and its own
associated optimal distribution.
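For finite strategy sets the integral in the expected utility becomes a sum over joint moves. A minimal sketch, using matching pennies as a hypothetical test game (the game and the function names are not from the paper):

```python
from itertools import product

def expected_utility(g, q):
    """E(g) = sum over joint moves x of prod_j q_j(x_j) * g(x)."""
    total = 0.0
    for x in product(*(range(len(qj)) for qj in q)):
        weight = 1.0
        for qj, xj in zip(q, x):
            weight *= qj[xj]
        total += weight * g(x)
    return total

def g0(x):
    """Payoff to player 0 in matching pennies: +1 on a match, -1 otherwise."""
    return 1.0 if x[0] == x[1] else -1.0

uniform = [[0.5, 0.5], [0.5, 0.5]]
biased = [[1.0, 0.0], [0.75, 0.25]]
print(expected_utility(g0, uniform), expected_utility(g0, biased))
```

The enumeration grows exponentially with the number of players, which is one reason sampling-based estimates are attractive for larger problems.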

In the absence of full rationality, the equilibrium is determined
based on the information available to the players.
Shannon realized that there is a unique real-valued quantification
of the amount of syntactic information in a distribution
$P(y)$. This amount of information is the negative of
the Shannon entropy of that distribution:

$$S(P) = -\int dy\, P(y) \ln[P(y)]$$

Hence, the distribution with minimal information is the
one that does not distinguish at all between the various $y$,
i.e., the uniform distribution. Conversely, the most informative
distribution is the one that specifies a single possible
$y$. Given some incomplete prior knowledge about a distribution
$P(y)$, this says that the estimate $P(y)$ should contain
the minimal amount of extra information beyond that already
contained in the prior knowledge about $P(y)$. This
approach is called the maximum entropy (maxent) principle
and it has proven useful in domains ranging from signal
processing to supervised learning. 13

Now consider an external observer of a game attempting
to determine the equilibrium, that is, the joint strategy that
will be followed by real-world players of the game. Assume
that the observer is provided with a set of expected utilities
for the players. The best estimate of the joint distribution $q$
that generated those expected utility values, by the maxent
principle, is the distribution with maximal entropy, subject
to those expectation values.

To formalize this approach, we assume a finite number of
players and of possible strategies for each player. Also, to
agree with convention, it is necessary to flip the sign of each
$g_i$ so that the associated player $i$ wants to minimize that
function rather than maximize it.

For prior knowledge consisting of the set of expected
utilities of the players $\{\epsilon_i\}$, the maxent estimate of the
associated $q$ is given by the minimizer of the Lagrangian:

$$\mathcal{L}(q) = \sum_i \beta_i \left[ E_q(g_i) - \epsilon_i \right] - S(q) \qquad (1)$$

$$= \sum_i \beta_i \left[ \int dx \prod_j q_j(x_j)\, g_i(x) - \epsilon_i \right] - S(q) \qquad (2)$$

where the subscript on the expectation value indicates that
it is evaluated under distribution $q$, and the $\{\beta_i\}$ are “inverse
temperatures” $\beta_i = 1/T_i$ implicitly set by the constraints on
the expected utilities.

The mixed strategies minimizing the Lagrangian are related
to each other via

$$q_i(x_i) \propto e^{-E_{q_{(i)}}(G \mid x_i)} \qquad (3)$$

where the overall proportionality constant for each $i$ is set
by normalization, and

$$G(x) = \sum_i \beta_i g_i(x)$$

The subscript $q_{(i)}$ on the expectation value indicates that
it is evaluated according to the distribution $\prod_{j \ne i} q_j$. The
expectation is conditioned on player $i$ making move $x_i$. In
Eq. (3) the probability of player $i$ choosing pure strategy $x_i$
depends on the effect of that choice on the utilities of the
other players. This reflects the fact that the prior knowledge
concerns all the players equally.

Focusing on the behavior of player $i$, consider the case of
maximal prior knowledge. Here the actual joint-strategy of
the players and therefore all of their expected utilities are
known. For this case, trivially, the maxent principle says
the “estimate” $q$ is that joint-strategy (it being the $q$ with
maximal entropy that is consistent with our prior knowledge).
The same conclusion holds if our prior knowledge
also includes the expected utility of player $i$.

Removing player $i$’s strategy from this maximal prior
knowledge leaves the mixed strategies of all players other
than $i$, together with player $i$’s expected utility. Now the
prior knowledge of the other players’ mixed strategies can
be directly incorporated into a maxent Lagrangian for each
player,

$$\mathcal{L}_i(q_i) = \beta_i \left[ E(g_i) - \epsilon_i \right] - S_i(q_i)$$

$$= \beta_i \left[ \int dx \prod_j q_j(x_j)\, g_i(x) - \epsilon_i \right] - S_i(q_i)$$

The solution is a set of coupled Boltzmann distributions:

$$q_i(x_i) \propto e^{-\beta_i E_{q_{(i)}}(g_i \mid x_i)} \qquad (4)$$

Following Nash, Brouwer’s fixed point theorem can be used
to establish that for any non-negative values $\{\beta_i\}$, there must
exist at least one product distribution given by the product
of these Boltzmann distributions (one term in the product
for each $i$).

The first term in $\mathcal{L}_i$ is minimized by a perfectly rational
player. The second term is minimized by a perfectly
irrational player, i.e., by a perfectly uniform mixed strategy
$q_i$. So $\beta_i$ in the maxent Lagrangian explicitly specifies the
balance between the rational and irrational behavior of the
player. In the limit, $\beta \to \infty$, the set of $q$ that simultaneously
minimize the Lagrangians is the same as the set of
delta functions about the Nash equilibria of the game. The
same is true for Eq. (3). In fact, Eq. (3) is just a special case
of Eq. (4), where all players share the same private utility,
$G$. Such games are known as team games. This relationship
reflects the fact that for this case, the difference between the
maxent Lagrangian and the one in Eq. (2) is independent of
$q_i$. Due to this relationship, the guarantee of the existence
of a solution to the set of maxent Lagrangians implies the
existence of a solution of the form Eq. (3).
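The role of the inverse temperature can be illustrated numerically. The sketch below evaluates a single player's Boltzmann distribution for fixed conditional expected utilities and reports its Shannon entropy; the utility values and function names are made up for illustration, and in the actual game the conditional expectations would themselves depend on the other players' distributions.

```python
import math

def entropy(q):
    """Shannon entropy S(q) = -sum q ln q, with 0 ln 0 taken as 0."""
    return -sum(p * math.log(p) for p in q if p > 0)

def boltzmann(cond_utils, beta):
    """q_i(x_i) proportional to exp(-beta * E[g_i | x_i]), normalized."""
    shift = min(cond_utils)  # subtracting the minimum avoids overflow
    weights = [math.exp(-beta * (e - shift)) for e in cond_utils]
    z = sum(weights)
    return [w / z for w in weights]

cond_utils = [3.0, 1.0, 2.0]                  # hypothetical E[g_i | x_i] values
q_irrational = boltzmann(cond_utils, beta=0.0)
q_rational = boltzmann(cond_utils, beta=50.0)
print(q_irrational, entropy(q_irrational))
print(q_rational, entropy(q_rational))
```

With beta at zero the strategy is uniform and the entropy is maximal; as beta grows the probability mass concentrates on the move with the lowest expected utility, which matches the rational limit described above.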
Optimization Approach

Given that the agents in a multi-agent system are
bounded rational, if they play a team game with world 
utility G, their equilibrium will be the optimizer of G. Fur- 
thermore, if constraints are included, the equilibrium will be 
the optimizer of G subject to the constraints. The equilib- 
rium can be found by minimizing the Lagrangian in Eq. (2) 
where the prior information set is empty, e.g. for all i, 

$$\epsilon_i = \{\emptyset\}$$

Specifically for the unconstrained optimization problem,

$$\min_x G(x)$$

assume each agent sets one component of $x$ as that agent’s
action. The Lagrangian $\mathcal{L}_i(q_i)$ for each agent as a function
of the probability distribution across its actions is,

$$\mathcal{L}_i(q_i) = E[G(x_i, x_{(i)})] - T\, S(q_i)$$

$$= \sum_{x_i} q_i(x_i)\, E[G(x_i, x_{(i)}) \mid x_i] - T\, S(q_i)$$

where $G$ is the world utility (system objective) which depends
upon the action of agent $i$, $x_i$, and the actions of
the other agents, $x_{(i)}$. The expectation $E[G(x_i, x_{(i)}) \mid x_i]$ is
evaluated according to the distributions of the agents other
than $i$:

$$q_{(i)} = \prod_{j \ne i} q_j(x_j)$$

The entropy $S$ is given by:

$$S(q_i) = -\sum_{x_j} q_i(x_j) \ln q_i(x_j)$$

Each agent then addresses the following local optimization
problem,

$$\min_{q_i} \mathcal{L}_i(q_i)$$

$$\text{s.t.} \quad \sum_{x_i} q_i(x_i) = 1, \quad q_i(x_i) \ge 0, \ \forall x_i$$

The Lagrangian is composed of two terms weighted by 
the temperature T : the expected reward across i’s actions, 
and the entropy associated with the probability distribu- 
tion across i’s actions. During the minimization of the 
Lagrangian, the temperature provides the means to trade-off 
exploitation of good actions (low temperature) with explo- 
ration of other possible actions (high temperature). 
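The exploration and exploitation trade-off can be seen by evaluating the per-agent Lagrangian at its closed-form minimizer for several temperatures. This sketch holds the conditional expectations fixed at made-up values, which is a simplification: in the full algorithm they change as the other agents update. The function names are illustrative.

```python
import math

def lagrangian(q, cond_G, T):
    """L_i(q_i) = sum_x q(x) E[G | x] - T * S(q)."""
    expected = sum(p * g for p, g in zip(q, cond_G))
    ent = -sum(p * math.log(p) for p in q if p > 0)
    return expected - T * ent

def boltzmann(cond_G, T):
    """Closed-form minimizer of L_i when the conditional expectations are fixed."""
    shift = min(cond_G)
    w = [math.exp(-(g - shift) / T) for g in cond_G]
    z = sum(w)
    return [x / z for x in w]

cond_G = [4.0, 1.0, 2.0]     # hypothetical E[G | x_i] for three actions
for T in (0.1, 1.0, 10.0):
    q = boltzmann(cond_G, T)
    print(T, [round(p, 3) for p in q], round(lagrangian(q, cond_G, T), 3))
```

At low temperature nearly all probability sits on the best action; at high temperature the distribution stays close to uniform, so poorer actions keep being sampled.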

The minimization of the Lagrangian is amenable to solu- 
tion using gradient descent or Newton updating since both 
the gradient and the Hessian are obtained in closed form. 
Using Newton updating and enforcing the constraint on to- 
tal probability, the following update rule is obtained: 

$$q_i(x_i) \to q_i(x_i) - \alpha\, q_i(x_i) \times \left\{ \frac{E[G \mid x_i] - E[G]}{T} + S(q_i) + \ln q_i(x_i) \right\} \qquad (5)$$

where $\alpha$ plays the role of a step size. The step size is required
since the expectations result from the current probability
distributions of all the agents.

Constraints are included by augmenting the world utility
with Lagrange multipliers, $\lambda_j$, and the constraint functions,
$c_j(x)$,

$$G(x) \to G(x) + \sum_j \lambda_j c_j(x)$$

where the $c_j(x)$ are non-negative. The update rule for the
Lagrange multipliers is found by taking the derivative of
the augmented Lagrangian with respect to each Lagrange
multiplier, giving:

$$\lambda_j \to \lambda_j + \eta\, E[c_j(x)] \qquad (6)$$

where $\eta$ is a separate step size.

Fig. 3 Algorithm Flow Chart.

Role of Private Utilities

Performing the update involves a separate conditional
expected utility for each agent. These are estimated either
exactly if a closed form expression is available or with
Monte Carlo sampling if no simple closed form exists. In
Monte Carlo sampling the agents repeatedly and jointly
IID (identically and independently distributed) sample their
probability distributions to generate joint moves, and the
associated utility values are recorded. Since accurate estimates
usually require extensive sampling, the $G$ occurring
in each agent $i$’s update rule can be replaced with a private
utility $g_i$ chosen to ensure that the Monte Carlo estimation
of $E(g_i \mid x_i)$ has both low bias (with respect to estimating
$E(G \mid x_i)$) and low variance. 14

Intuitively, bias represents the alignment between the private
utility and world utility. With zero bias, updates which
reduce the private utility are guaranteed to also reduce the
world utility. It is also desirable for an agent to distinguish
its contribution from that of the other agents: variance measures
this sensitivity. With low variance, the agents can
perform the individual optimizations accurately without a
large number of Monte Carlo samples.

Two private utilities were selected for use in the fleet assignment
problem, Team Game (TG) and Wonderful Life
Utility (WLU). These are defined as:

$$g_{TG_i}(x_i, x_{(i)}) = G(x_i, x_{(i)})$$

$$g_{WLU_i}(x_i, x_{(i)}) = G(x_i, x_{(i)}) - G(CL_i, x_{(i)})$$

For the team game, the local utility is simply the world
utility. For WLU, the local utility is the world utility minus
the world utility with the agent action “clamped” by the
value $CL_i$. Here the clamping value fixes the agent action
to its lowest probability action. Both of these utilities have
zero bias. However, due to the subtracted term, WLU has
much lower variance than TG.
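The variance argument can be illustrated with a toy experiment. The sketch below assumes a deliberately simple, separable world utility so that the effect is easy to see; the utility, the clamp value, and the function names are all illustrative assumptions, and for strongly coupled utilities the reduction need not be this clean.

```python
import random
from statistics import pvariance

def G(x):
    """Hypothetical separable world utility: sum of per-agent costs."""
    return sum((xi - 2) ** 2 for xi in x)

def team_game(i, x):
    """TG: every agent receives the world utility itself."""
    return G(x)

def wonderful_life(i, x, clamp):
    """WLU: G at the actual joint move minus G with agent i clamped to `clamp`."""
    clamped = list(x)
    clamped[i] = clamp
    return G(x) - G(clamped)

rng = random.Random(1)
n_agents, moves = 6, range(5)
tg, wlu = [], []
for _ in range(500):
    x = [rng.choice(moves) for _ in range(n_agents)]
    x[0] = 4                          # condition on agent 0 playing move 4
    tg.append(team_game(0, x))
    wlu.append(wonderful_life(0, x, clamp=2))
var_tg, var_wlu = pvariance(tg), pvariance(wlu)
print(var_tg, var_wlu)
```

Because the subtracted term cancels everything that does not depend on agent 0, the WLU samples carry no noise from the other agents in this example, whereas the TG samples for the same move fluctuate with every other agent's choice.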
Table 2 Passenger demand for each arc as a function of
time. 


Table 3 Number of flights assigned to each arc at each 
time segment. 


Detailed Algorithm 

The algorithm used to solve the fleet assignment and 
aircraft routing problem presented here is illustrated in Fig- 
ure 3. The initialization step involves setting each agent’s 
probabilities to uniform over its possible moves. The La- 
grange multipliers for all the constraints are initialized to 
zero. A loop is then repeated until convergence. Within the 
loop the Monte-Carlo sampling is performed, after which 
the private utilities for each agent are computed. The num- 
ber of function evaluations required depends on the private 
utility. Team Game requires one function evaluation for 
each Monte-Carlo sample, while the generic version of the 
Wonderful Life utility requires as many function evaluations 
as there are variables. Often the structure of the objective 
function and constraints can be exploited in the evaluation 
of WLU to avoid unnecessary function calls. 7,9 In order 
to demonstrate the performance of the algorithm without 
any such preprocessing, the present work makes no effort 
to exploit the structure of the fleet assignment formulation 
described above. Due to the discrete nature of the agent 
moves, the regression step involves simply averaging the pri- 
vate utility received for each agent’s move over the Monte 
Carlo samples. Data aging is used within the regression to
preserve information from previous iterations. The previous
private utility estimates are weighted by a factor $\gamma$ compared
with the new samples during the regression. The probabilities
and Lagrange multipliers are then updated according to
Eqs. (5) and (6), respectively. Eq. (5) automatically enforces
the constraint on total probability but does not prevent negative
probabilities. To prevent negative probabilities the
probability update is modified by setting all components
that would be negative to a small positive value, typically
$1 \times 10^{-6}$, and then re-normalizing. Finally, the convergence
criterion is checked, in this case a combination of the norms
of the probability and Lagrange multiplier updates:

$$\delta_q^{0.5} + \delta_\lambda^{0.5} < 1 \times 10^{-4}$$

with:

$$\delta_q = \sum_i \sum_{x_i} \left( q_i(x_i) - q_i(x_i)_{\mathrm{prev}} \right)^2$$

$$\delta_\lambda = \sum_j \left( \lambda_j - \lambda_{j,\mathrm{prev}} \right)^2$$

If the criterion is not met, the sampling and update process
is repeated.
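The supporting steps of the loop (regression with data aging, the negative-probability safeguard, the Lagrange multiplier update of Eq. (6), and the convergence check) can be sketched as small helpers. This is a sketch under assumptions: the exact data-aging weighting is not given in the text, so the scheme in `aged_average` is one plausible reading, and all function names are hypothetical.

```python
import math

def aged_average(old_est, old_weight, samples, gamma):
    """Blend a previous estimate with new samples, down-weighting old data by gamma.

    Returns the new estimate and its effective sample weight. The weighting
    scheme is an assumed interpretation of "data aging", not the paper's rule.
    """
    w_old = gamma * old_weight
    total = w_old + len(samples)
    if total == 0:
        return old_est, 0.0
    return (w_old * old_est + sum(samples)) / total, total

def clip_and_renormalize(q, floor=1e-6):
    """Replace non-positive entries with a small positive value, then renormalize."""
    clipped = [p if p > 0 else floor for p in q]
    z = sum(clipped)
    return [p / z for p in clipped]

def update_multipliers(lam, mean_violation, eta):
    """lambda_j <- lambda_j + eta * E[c_j(x)], with non-negative c_j."""
    return [l + eta * c for l, c in zip(lam, mean_violation)]

def converged(q_new, q_old, lam_new, lam_old, tol=1e-4):
    """Sum of the Euclidean norms of the probability and multiplier updates < tol."""
    dq = sum((a - b) ** 2
             for qi, qj in zip(q_new, q_old) for a, b in zip(qi, qj))
    dl = sum((a - b) ** 2 for a, b in zip(lam_new, lam_old))
    return math.sqrt(dq) + math.sqrt(dl) < tol

est, weight = aged_average(10.0, 4, [2.0, 4.0], gamma=0.5)
safe_q = clip_and_renormalize([0.7, -0.1, 0.4])
lam = update_multipliers([0.0, 1.0], [0.5, 0.0], eta=2.0)
print(est, safe_q, lam)
```

A multiplier only grows while its constraint is violated on average, so persistent violations become progressively more expensive in the augmented utility.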
Fig. 4 The resident fleet at each airport required to
optimally solve the problem. 

Results 

Linearized Constraints and Objective Function 

For assessment purposes, both the PD framework and 
AMPL-CPLEX 15,16 were applied to a linearized version of 
the 9-city, 20-arc fleet assignment problem (CPLEX does 
not support non-linear constraints). 

In this case, the passenger capacity $w$ was fixed to 100.
The linear objective becomes:

$$\min G = 100 \sum_i \sum_j u_{i,j}$$

with the total fleet passenger capacity:

$$C_{i,j} = 100 \cdot u_{i,j}$$

The problem features a time-dependent, asymmetric demand
structure as shown in Table 2.

To enhance the convergence speed, the objective was
squared, effectively dramatizing the topology of the problem:

$$\min G = \left( 100 \sum_i \sum_j u_{i,j} \right)^2$$

Both optimization tools reached the global minimum with a
fleet size of 43 aircraft: 228 flights are required, yielding 
an objective of 51,984. The number of flights assigned to 
each route is shown in Table 3, and the resident fleet size at 
each airport is illustrated in Figure 4. In order to capture 
the stochastic nature of the approach, the optimization was 
repeated 20 times. The figures show averages and ranges 
for the minimum objective in each block of Monte-Carlo 
samples. Each iteration is an update to the probability dis- 
tributions using a single block of Monte-Carlo samples. 

The importance of selecting the appropriate private utility
is shown in Figure 5. For each utility, the best temperature
was selected, 10 for WLU and 1000 for TG. The results show
that WLU performs considerably better than Team Game.
This is consistent with previous applications of COIN. 7,9

Fig. 5 Comparison of convergence with two private utilities
(200 Monte Carlo samples).

Fig. 6 Increasing sampling speeds convergence (Temperature
= 10, WLU, 20 runs).

As illustrated in Figure 6, the number of Monte Carlo 
samples between updates affects the rate of convergence. In 
almost all cases, 50 samples were not sufficient to find the 
minimum objective. With 200 samples, the minimum was 
found in 18 of 20 cases. Increasing the number of samples 
to 1000 resulted in all cases converging to the minimum. 

Similarly, selecting the correct temperature influences the 
optimization process (Figure 7). A low temperature (T=1)
did not allow enough exploration, while a high temperature 
(T=100) slowed convergence. For this example, a moderate 
temperature (T=10) offered the best trade-off between ex- 
ploration and exploitation. In particular, the case with the 
lowest temperature rapidly converged to an infeasible mini- 
mum. The objective then grew as the Lagrange multipliers 
increased. The optimizer, at this low temperature, is unable 
to explore other regions of the design space. 

Non-Linear Constraints and Objective Function 

Following the above linearized study, the methodology
was applied to the fully non-linear problem: the PD framework’s
indifference to constraint and objective types
makes it ideal in this case. The PD optimizer obtains a
solution that is very close to optimum: it selects the correct
aircraft, with 200 passenger capacity (resulting in an
LTO cost factor of 1.5) and requires a total of 146 flights
to meet demand. The objective is therefore 47,961. Figure 8
compares the convergence rate of the variable aircraft
type problem with the performance of the fixed aircraft type
problem; while the convergence takes slightly more iterations,
no changes in the parameter settings were made.
For comparison purposes, running AMPL-CPLEX with a
fixed passenger capacity of 200 resulted in a solution with
142 flights required, for an objective of 45,369. When the
PD framework is run with fixed aircraft capacity, this same
result is obtained.

Fig. 7 Effects of temperature on convergence (200
Monte Carlo samples, WLU, 20 runs).

Fig. 8 Comparison of optimum for the linearized (fixed
airplane capacity) and non-linear (variable airplane capacity)
problems.
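The objective values reported in this section can be reproduced from the flight counts. The sketch below assumes the squared objective takes the form (F(w) times the number of flights) squared, with F(w) from Table 1; that the reported numbers follow this form is an inference from the arithmetic rather than a statement in the text. A constant coefficient, such as the 100 written in the linearized objective, would rescale these values without changing the minimizer.

```python
COST_FACTOR = {100: 1.0, 200: 1.5, 300: 2.0}   # Table 1

def squared_objective(flights, w):
    """(F(w) * number of flights)^2, the assumed form of the squared objective."""
    return (COST_FACTOR[w] * flights) ** 2

linearized = squared_objective(228, 100)    # fixed capacity of 100
pd_nonlinear = squared_objective(146, 200)  # PD with capacity as a variable
fixed_200 = squared_objective(142, 200)     # AMPL-CPLEX, and PD with fixed capacity
gap = pd_nonlinear / fixed_200 - 1
print(linearized, pd_nonlinear, fixed_200, round(gap, 4))
```

Under this reading, the non-linear PD solution uses four more flights than the fixed-capacity optimum, a gap of a little under 6 percent in the squared objective.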

Conclusion 

A collective-intelligence framework was successfully applied
to a sample fleet assignment problem, yielding the globally
optimal solution for the linearized problem and a near-optimal
solution for the fully non-linear problem. With the basic
framework shown to handle highly-constrained design spaces
with non-linear constraints and objectives, a fleet assignment
problem of more realistic size can be approached. The function
evaluation was carefully formulated to allow for scalability and
automation, and features such as transfer passengers,
environmental considerations, and maintenance visit requirements
can be implemented. Exploring other types of agents
(perhaps airports or arcs) and developing problem-specific
local utilities may also yield faster convergence rates and
require fewer Monte Carlo samples.